Abstract. In this paper we describe a statistical procedure to account for
differences in grading practices from one course to another. The goal is to
define a course "inflatedness" and a student "aptitude" that best capture
one's intuitive notions of these concepts.

1. Introduction 

Course assessment and grading policy are topics of great interest to most
students. Mathematical models that address inherent unfairness in the assessment
process provide an excellent example of regression that can be taught in
undergraduate statistics and/or optimization courses. In fact, one of us (Scharf)
was a junior contemplating what would make an interesting senior thesis and, after
a casual dinner conversation with classmates, came up with the idea that a
statistical method to adjust student grade-point averages according to the
difficulty of the courses taken could lead to a very interesting thesis. This
article, while it highlights a different statistical approach than the one
originally proposed, is an outgrowth of that thesis.

Suppose a student takes both course X and course Y and gets a higher grade in 
course X than in course Y. Based on just one student, it is likely that the student 
simply has more aptitude for the material in course X than for the material in course 
Y. But, if most students who took both courses X and Y got a better grade in course 
X than in course Y, then one begins to think that course X simply employed a more 
inflated grading scheme. 

Consider, for example, a school with only four students: John, Paul, George, and
Ringo. Suppose that this school offers only six different courses, from which the
students select four to take. The students made their selections, took the courses,
and we now have grading information as shown in Table 1. From this table, we see
that George and Paul have received the same grades (in different courses) and so
their grade-point averages (GPA's) are the same. Furthermore, John's grades are
only slightly better and Ringo's grades only slightly worse than average. But, it is
also clear that the Math class gave lower grades than the Economics course. In fact,
there is a linear progression in grade-inflation as one progresses from left to right
across the table. Taking this into account, it would seem that John took "harder"
courses than Paul (the quotes are to emphasize that a course that gives lower grades
is not necessarily more difficult even though we shall use such language throughout
this paper), who took harder courses than George, who took harder courses than
Ringo. Hence, GPA does not tell an unbiased story. John did the best in all of his
courses, in many cases by a wide margin. Ringo, on the other hand, did the worst
in all of his classes, again by a wide margin. It is clear that John is a much better
student than Ringo, better to a degree that is not reflected in their GPA's.

          MAT   CHE   ANT   REL   POL   ECO
John      B-    B     B+    A-
Paul      C+    B-          B+    A-
George          C+    B-          B+    A-
Ringo                 C+    B-    B     B+

Table 1. Grading data from Beatle University. The six courses
are Math (MAT), Chemical Engineering (CHE), Anthropology
(ANT), Religion (REL), Politics (POL), and Economics (ECO).

Our aim is to develop a model that can be used to infer automatically the sort
of conclusions that we have just drawn for this small example. Of course, one must
consider the simplest suggestion of just computing averages within each course.
Clearly, in Table 1, the Math course gave grades a full letter grade lower than
the Econ course. One could argue that that is all one needs: just correct using
average grades within each course. But, one can easily modify the simple example
shown in Table 1 to make all the courses have the same average grade and all of
the students have the same GPA while there is still an obvious trend in the true
aptitude of the students. Table 2 shows one rather contrived way to do this (using
an unbounded list of courses and students).

Finally, the model must be computationally tractable so that it can be run for a 
school with thousands of students taking dozens of courses (over four years) selected 
from a catalogue of hundreds of courses. 



...       MAT   CHE   ANT   REL   POL   ECO   HIS
John      B-    B     B+    A-
Paul            B-    B     B+    A-
George                B-    B     B+    A-
Ringo                       B-    B     B+    A-

Table 2. A school with an infinite number of students and an
infinite selection of courses. Every student has the same GPA and
every course has the same course average. Yet, John is smarter
than Paul, who is smarter than George, who is smarter than Ringo,
and Math is harder than Chemical Engineering, which is harder than
Anthropology, etc.

2. The Model

We assume that there are $m$ students and $n$ courses. The data consist of the
grades for all courses taught. For each course, we assume that we have grading
data for every student who took that course. But, we do not assume that every
student takes every course offered. In fact, we assume quite the opposite, namely,
that each student takes only a small sample of the complete suite of courses offered.

We assume that each student has an aptitude^1 $\mu_i$, $i = 1, 2, \ldots, m$, which is
unknown to us and which we wish to estimate, and that each course has an inflatedness
$\nu_j$, $j = 1, 2, \ldots, n$, which is also unknown to us and also of interest to
estimate. We assume that each grade $X_{ij}$ can be approximated as the sum of the
student's aptitude plus the course's inflatedness:

(1)    $X_{ij} = \mu_i + \nu_j + \epsilon_{ij}, \qquad (i,j) \in Q,$

where $Q$ represents the set of student-course pairs for which we have a grade
(i.e., student $i$ actually took course $j$). And, of course, the $\epsilon_{ij}$'s
are the "errors" one needs to add to make the approximation an equality. These
errors reflect both the randomness associated with how any student might perform in
any particular course and also a systematic deviation between the student's overall
aptitude and his/her subject-specific aptitude for the material in the particular
course.

^1 Several colleagues have pointed out the obvious fact that aptitude varies from
subject to subject. We are not trying to capture this variation. In this paper, we
consider "aptitude" to be a synonym for "modified GPA": a one-dimensional parameter
that could be used to determine class rank, awards, etc.

Ideally, grades should reflect aptitude. Hence, we would like to say that a student
with a B-level aptitude should be expected to get B-level grades. In other words,
inflatedness should measure deviations, both positive (for courses with high grades)
and negative (for courses with low grades), around some neutral average grade. We
therefore impose the added constraint that

(2)    $\sum_j \nu_j = 0.$

This, of course, is by choice. We need some sort of normalization. Without one, we
could add an arbitrary constant to every $\mu_i$ and subtract the same constant from
every $\nu_j$ without changing any of the $\epsilon_{ij}$'s.

Our aim is to find the best "fit" to the data. That is, we wish to choose the
$\mu_i$'s and the $\nu_j$'s in such a manner as to make the $\epsilon_{ij}$'s as small
as possible. To do this, we minimize the sum of the squares of the $\epsilon_{ij}$'s:

(3)    minimize    $\sum_{(i,j) \in Q} \epsilon_{ij}^2$
       subject to  $X_{ij} = \mu_i + \nu_j + \epsilon_{ij}$ for $(i,j) \in Q$,
                   $\sum_j \nu_j = 0.$

Of course, we could minimize the sum of the absolute values instead of the sum
of the squares. Generally speaking, sample means minimize the sum of squares
whereas sample medians minimize the sum of absolute deviations. Medians are
more robust estimators of centrality than means but it is easier to provide confidence
intervals for means. For the latter reason, we will stick with summing squares for
most of this paper.

          MAT     CHE     ANT     REL     POL     ECO     GPA     $\mu_i$
John      B-      B       B+      A-                      3.18    3.51
Paul      C+      B-              B+      A-              3.00    3.16
George            C+      B-              B+      A-      3.00    2.84
Ringo                     C+      B-      B       B+      2.83    2.49
Avg.      2.50    2.70    2.77    3.23    3.33    3.50
$\nu_j$   -0.84   -0.50   -0.18   +0.18   +0.50   +0.84

Table 3. The same example as shown in Table 1 with aptitude
$\mu_i$ and inflatedness $\nu_j$ shown alongside row and column grade
averages.

Table 3 shows the output for Beatle University. The student aptitude metrics
clearly show that John is the smartest Beatle. Also, while average grades in the
courses correctly show that Math is the most difficult and Econ is the easiest, the
inflatedness metric expands on the disparity. For example, based on averages, a
student might think that the difference between Math and Econ is just one full
letter grade but the inflatedness metric suggests the difference is more like one and
two-thirds letter grades (1.68 to be precise).
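
As a rough illustration of problem (3), the following Python sketch fits the Beatle University data with NumPy. It is not the authors' code: the names `POINTS`, `GRADES`, and `fit_least_squares` are hypothetical, and the letter-to-number conversion (A- = 3.7, B+ = 3.3, B = 3.0, B- = 2.7, C+ = 2.3) is an assumption, chosen because it reproduces the GPA column of Table 3. The normalization $\sum_j \nu_j = 0$ is applied after the fit by moving a constant from the inflatedness values to the aptitudes, which leaves every residual unchanged.

```python
import numpy as np

# Assumed conversion of letter grades to a 4-point scale.
POINTS = {"A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7, "C+": 2.3}
COURSES = ["MAT", "CHE", "ANT", "REL", "POL", "ECO"]
GRADES = {
    "John":   {"MAT": "B-", "CHE": "B",  "ANT": "B+", "REL": "A-"},
    "Paul":   {"MAT": "C+", "CHE": "B-", "REL": "B+", "POL": "A-"},
    "George": {"CHE": "C+", "ANT": "B-", "POL": "B+", "ECO": "A-"},
    "Ringo":  {"ANT": "C+", "REL": "B-", "POL": "B",  "ECO": "B+"},
}


def fit_least_squares(grades, courses, points):
    """Return (aptitude, inflatedness) dicts minimizing the sum of squared errors."""
    students = list(grades)
    m, n = len(students), len(courses)
    rows, rhs = [], []
    for i, student in enumerate(students):
        for course, letter in grades[student].items():
            row = np.zeros(m + n)
            row[i] = 1.0                          # coefficient of mu_i
            row[m + courses.index(course)] = 1.0  # coefficient of nu_j
            rows.append(row)
            rhs.append(points[letter])
    # The system is rank deficient by one; lstsq returns one of the minimizers.
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    mu, nu = sol[:m], sol[m:]
    shift = nu.mean()                             # enforce sum_j nu_j = 0
    mu, nu = mu + shift, nu - shift
    return dict(zip(students, mu)), dict(zip(courses, nu))


aptitude, inflatedness = fit_least_squares(GRADES, COURSES, POINTS)
for name, value in aptitude.items():
    print(f"{name:7s} {value:5.2f}")
for name, value in inflatedness.items():
    print(f"{name:7s} {value:+5.2f}")
```

Rounded to two decimals, the printed aptitudes and inflatedness values should agree with the last column and last row of Table 3.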

We will return to more examples later in Section 5, including one example using
real-world data. But, first, let us analyze our model.

3. Least Squares 

Statistical estimates of underlying unobserved fundamental quantities have little 
value without an associated estimate for an error in the estimation. For general least 
squares models, it is well understood how to produce such error bars. Nonetheless, 
it is instructive to derive the formulae from scratch in this particular context, at 
least in the particular case where we assume, unrealistically, that every student 
takes every course. 

3.1. Estimating Means. To make a connection with utterly standard and
elementary concepts, let us assume for the moment that we simply want to estimate
some underlying single parameter $\mu$ based on $n$ observations $X_j$,
$j = 1, 2, \ldots, n$. In other words, we assume that

    $X_j = \mu + \epsilon_j,$

where the $\epsilon_j$'s are taken to be independent, identically-distributed random
variables with mean zero and variance $\sigma^2$. The parameter $\mu$ is unknown and
to be estimated. The variance $\sigma^2$ is also unknown and must be estimated as
well. The least squares estimator $\hat\mu$ for $\mu$ is that value of $\mu$ that
minimizes

    $f(\mu) = \frac{1}{n} \sum_j (X_j - \mu)^2.$

Taking the derivative and setting it equal to zero, one gets that $\hat\mu$ is just
the sample mean:

    $\hat\mu = \frac{1}{n} \sum_j X_j.$

Since the $X_j$'s are independent and have variance $\sigma^2$, it follows that
$\hat\mu$ has variance $\sigma^2/n$. The function $f$ evaluated at $\hat\mu$ provides
a good estimator for $\sigma^2$:

    $\sigma^2 \approx f(\hat\mu) = \frac{1}{n} \sum_j (X_j - \hat\mu)^2.$
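
A small numerical sketch of these formulas in plain Python may help. The function names are hypothetical, and the sample data (John's four grades from Table 1 on an assumed 4-point scale) are used purely for illustration.

```python
def f(xs, mu):
    """Mean squared deviation of the observations xs from the candidate mu."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def estimate_mean(xs):
    """Return the least squares estimate of mu, the plug-in estimate of
    sigma^2, and the implied estimate of var(mu_hat) = sigma^2 / n."""
    mu_hat = sum(xs) / len(xs)
    sigma2_hat = f(xs, mu_hat)
    return mu_hat, sigma2_hat, sigma2_hat / len(xs)


grades = [2.7, 3.0, 3.3, 3.7]  # B-, B, B+, A- (assumed numeric conversion)
mu_hat, sigma2_hat, var_mu_hat = estimate_mean(grades)
print(mu_hat, sigma2_hat, var_mu_hat)
```

Evaluating `f` slightly to either side of `mu_hat` gives a larger value, as the derivative condition predicts.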



3.2. Every Student Takes Every Course. Now let's consider the problem of
estimating aptitude and inflatedness from grade data. But, in an attempt to keep
things simple, let us assume that every student takes every course. We have $m$
students and $n$ courses and therefore the set $Q$ consists of $mn$ pairs for which
we have grades. As before, let $f$ denote the function to be minimized:

    $f(\mu_1, \ldots, \mu_m, \nu_1, \ldots, \nu_n) = \frac{1}{mn} \sum_{i,j} (X_{ij} - \mu_i - \nu_j)^2.$

As mentioned earlier, there is an ambiguity in the model: we could add an arbitrary
constant to every aptitude and subtract that same constant from every inflatedness
and the function $f$ would be unchanged. In a previous section, we addressed this
ambiguity by imposing one extra constraint, namely, that the sum of the $\nu_j$'s be
zero. We could do that here, introducing then the associated Lagrange multiplier,
forming the Lagrangian, and solving the problem that way. But, it is such a simple
constraint that we prefer to introduce it in a less formal manner as we go. In doing
so, we hope that the analysis will be more transparent, not less.

Taking derivatives with respect to each of the variables and setting these
derivatives to zero, we get the following system of equations for the estimators
$\hat\mu_i$'s and $\hat\nu_j$'s:

    $\hat\mu_i = \frac{1}{n} \sum_j (X_{ij} - \hat\nu_j), \qquad i = 1, 2, \ldots, m,$

    $\hat\nu_j = \frac{1}{m} \sum_i (X_{ij} - \hat\mu_i), \qquad j = 1, 2, \ldots, n.$

Here, it is convenient to switch to matrix-vector notation. So, letting

    $\hat\mu = \begin{bmatrix} \hat\mu_1 \\ \hat\mu_2 \\ \vdots \\ \hat\mu_m \end{bmatrix}, \qquad \hat\nu = \begin{bmatrix} \hat\nu_1 & \hat\nu_2 & \cdots & \hat\nu_n \end{bmatrix},$

and

    $X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{bmatrix},$

we can rewrite our optimality equations as

       $\hat\mu = \frac{1}{n} (X e - e \hat\nu e),$

(4)    $\hat\nu = \frac{1}{m} (e^T X - e^T \hat\mu e^T),$

where $e$ denotes a column vector of either $m$ or $n$ ones, the dimension being
obvious from context. Substituting the second equation into the first, we can
isolate $\hat\mu$:

    $\hat\mu = \frac{1}{n} \left( X e - \frac{1}{m} e \left( e^T X - e^T \hat\mu e^T \right) e \right).$

Collecting terms involving $\hat\mu$ on the left side, the remaining terms on the
right-hand side, and using the fact that $e^T e = n$, we get

    $\left( I - \frac{1}{m} e e^T \right) \hat\mu = \left( I - \frac{1}{m} e e^T \right) \left( \frac{1}{n} X e \right).$

If the matrix $I - e e^T / m$ were nonsingular, we would at this point conclude that

(5)    $\hat\mu = \frac{1}{n} X e.$

But, the matrix is singular with rank deficiency one ($e$ is in the null space). So,
there are other choices for $\hat\mu$. Indeed, there is a one-parameter family of
choices (any $\hat\mu$ for which $\hat\mu - (1/n) X e$ is in the null space of
$I - e e^T / m$). Nonetheless, we choose to let $\hat\mu$ be given by (5) and, as we
shall now show, this choice guarantees that the sum of the $\hat\nu_j$'s vanishes as
we have required. Indeed, plugging (5) into (4), we get

(6)    $\hat\nu = \frac{1}{m} \left( e^T X - \frac{1}{n} e^T X e e^T \right)$

and therefore that

    $\hat\nu e = \frac{1}{m} \left( e^T X - \frac{1}{n} e^T X e e^T \right) e = \frac{1}{m} \left( e^T X e - e^T X e \right) = 0,$

the second equality following from the fact that $e^T e = n$.

From (5) and (6), we see that the $\hat\mu_i$'s and the $\hat\nu_j$'s are just row
and column sample means with one of them shifted by the overall mean.

Reverting to explicit component notation, (5) and (6) can be written as

    $\hat\mu_i = \frac{1}{n} \sum_j X_{ij}, \qquad i = 1, 2, \ldots, m,$

    $\hat\nu_j = \frac{1}{m} \sum_i X_{ij} - \frac{1}{mn} \sum_{k,l} X_{kl}, \qquad j = 1, 2, \ldots, n.$

From the first formula, we immediately see that

(7)    $\mathrm{var}(\hat\mu_i) = \frac{\sigma^2}{n}.$

Computing the variance of the $\hat\nu_j$'s is a little more tedious but entirely
routine. The result is

(8)    $\mathrm{var}(\hat\nu_j) = \frac{\sigma^2}{m} \left( 1 - \frac{1}{n} \right).$

Finally, we need an estimate of $\sigma^2$. As before, we can use the objective
function $f$ evaluated at the optimal values for the $\hat\mu_i$'s and $\hat\nu_j$'s:

    $\sigma^2 \approx f(\hat\mu_1, \ldots, \hat\mu_m, \hat\nu_1, \ldots, \hat\nu_n) = \frac{1}{mn} \sum_{i,j} (X_{ij} - \hat\mu_i - \hat\nu_j)^2.$
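
The closed-form results of this subsection are easy to check numerically. The sketch below, in plain Python, computes the estimators for a small complete grade table and the variance estimates from (7) and (8). The function name `fit_complete` and the 3-by-4 grade table are made up for illustration.

```python
def fit_complete(X):
    """Closed-form least squares fit when every student takes every course.

    X is a list of rows, one row per student and one column per course.
    """
    m, n = len(X), len(X[0])
    overall = sum(map(sum, X)) / (m * n)
    mu = [sum(row) / n for row in X]                      # row means, (5)
    col_means = [sum(X[i][j] for i in range(m)) / m for j in range(n)]
    nu = [c - overall for c in col_means]                 # shifted column means, (6)
    sigma2 = sum((X[i][j] - mu[i] - nu[j]) ** 2
                 for i in range(m) for j in range(n)) / (m * n)
    var_mu = sigma2 / n                                   # equation (7)
    var_nu = sigma2 / m * (1 - 1 / n)                     # equation (8)
    return mu, nu, sigma2, var_mu, var_nu


# Hypothetical grades: 3 students (rows) by 4 courses (columns).
X = [[3.7, 3.3, 4.0, 3.0],
     [3.0, 2.7, 3.3, 2.3],
     [3.3, 3.0, 3.7, 3.0]]
mu, nu, sigma2, var_mu, var_nu = fit_complete(X)
print(mu, nu, sigma2)
```

The returned values satisfy both optimality equations, and the inflatedness values sum to zero up to rounding error.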

3.3. Students Take Selected Courses. Now suppose that each student takes
only a small subset of the courses offered. For each student $i$, let $J(i)$ denote
the set of courses taken by student $i$. Similarly, for each course $j$, let $I(j)$
denote the set of students that took course $j$.

The least-squares loss function is now given by

    $f(\mu_1, \ldots, \mu_m, \nu_1, \ldots, \nu_n) = \frac{1}{N} \sum_{(i,j) \in Q} (X_{ij} - \mu_i - \nu_j)^2,$

where $N$ denotes the cardinality of the grade-set $Q$. Again, we differentiate and
set to zero. This time we get

(9)     $\hat\mu_i = \frac{1}{n_i} \sum_{j \in J(i)} (X_{ij} - \hat\nu_j), \qquad i = 1, 2, \ldots, m,$

(10)    $\hat\nu_j = \frac{1}{m_j} \sum_{i \in I(j)} (X_{ij} - \hat\mu_i), \qquad j = 1, 2, \ldots, n,$

where $n_i$ denotes the cardinality of $J(i)$ and $m_j$ denotes the cardinality of
$I(j)$. Substituting (10) into (9), we get

    $\hat\mu_i = \frac{1}{n_i} \sum_{j \in J(i)} \left( X_{ij} - \frac{1}{m_j} \sum_{k \in I(j)} (X_{kj} - \hat\mu_k) \right), \qquad i = 1, 2, \ldots, m.$

This is a set of $m$ equations in $m$ unknowns. If there is adequate diversity in
student course selections so that every course indirectly is connected to every other
course, then one would expect this system to have rank $m - 1$, leaving only a
one-dimensional ambiguity in the equations. Inspired by the simplicity of the results
in the previous section, we can hope that again simple sample means will provide one
solution to this system of equations:

    $\hat\mu_i = \frac{1}{n_i} \sum_{j \in J(i)} X_{ij}, \qquad i = 1, 2, \ldots, m.$

In order for this to be correct, we need to have

    $\sum_{j \in J(i)} \frac{1}{m_j} \sum_{k \in I(j)} (X_{kj} - \hat\mu_k) = 0, \qquad i = 1, 2, \ldots, m.$

Unfortunately, there is no particular reason for this to be true. And, as we saw
with the second example in the introduction, it is possible for the sample means
to be all the same even when there is a big difference in course grade inflatedness
and/or in student aptitude. The model detects such differences.

Even though it appears there is no simple formula for the solution to the
least-squares formulation of our problem, modern statistical and/or optimization
software can solve these problems numerically without difficulty even when the data
sets are very large.
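
Equations (9) and (10) also suggest a simple numerical scheme: update the aptitudes from the current inflatedness values, then the inflatedness values from the new aptitudes, and repeat. The sketch below is one assumed way to do this in plain Python. It amounts to block coordinate descent on the least-squares objective and is not a method described in the text. It is applied to the truncated example of Table 4, with grades encoded in thirds of a point (B- = 8/3, B = 3, B+ = 10/3, A- = 11/3, A = 4), which is an assumption. The function name, the number of sweeps, and the final shift that enforces $\sum_j \nu_j = 0$ are illustrative choices.

```python
def fit_alternating(grades, sweeps=1000):
    """Alternate equations (9) and (10).

    grades maps (student, course) pairs to numeric grades.
    """
    by_student, by_course = {}, {}
    for (s, c), x in grades.items():
        by_student.setdefault(s, []).append((c, x))
        by_course.setdefault(c, []).append((s, x))
    mu = dict.fromkeys(by_student, 0.0)
    nu = dict.fromkeys(by_course, 0.0)
    for _ in range(sweeps):
        for s, taken in by_student.items():      # equation (9)
            mu[s] = sum(x - nu[c] for c, x in taken) / len(taken)
        for c, enrolled in by_course.items():    # equation (10)
            nu[c] = sum(x - mu[s] for s, x in enrolled) / len(enrolled)
    shift = sum(nu.values()) / len(nu)           # move the free constant
    return ({s: v + shift for s, v in mu.items()},
            {c: v - shift for c, v in nu.items()})


STUDENTS = ["Sean", "Yoko", "John", "Paul", "George", "Ringo", "Jane", "Heather"]
COURSES = ["MAT", "CHE", "ANT", "REL", "POL", "ECO", "HIS", "SOC"]
# Student i takes course j when |j - i| <= 2; each step to the right raises
# the grade by a third of a point, starting from B+ = 10/3 on the diagonal.
grades = {(s, c): 10 / 3 + (j - i) / 3
          for i, s in enumerate(STUDENTS)
          for j, c in enumerate(COURSES) if abs(j - i) <= 2}
mu, nu = fit_alternating(grades)
for s in STUDENTS:
    print(f"{s:8s} {mu[s]:5.2f}")
for c in COURSES:
    print(f"{c:8s} {nu[c]:+5.2f}")
```

For this data the fit is exact, and the normalized values should match the last column and last row of Table 4 (about 4.50 for Sean and about -1.17 for MAT). For data sets of realistic size, a sparse least-squares solver would be the more usual choice.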
Also, the fact that we have not been able to give a simple concrete formula for
the $\hat\mu_i$'s and the $\hat\nu_j$'s makes it impossible to give a simple concrete
formula for the variance of these random variables. Nonetheless, we can infer from
the concrete results obtained before that one should first estimate $\sigma^2$ using
the optimal value of the objective function as an estimate of this quantity and then
the variance of the individual $\hat\mu_i$'s and $\hat\nu_j$'s can be approximated
simply by dividing by the number of grades reflected in that aggregation (that is,
either $n_i$ or $m_j$).

4. Least Absolute Deviations 

In this section, we consider a robust model in which we minimize the sum of 
the absolute deviations. To motivate what follows, we start with a brief review of 
medians. 

4.1. Medians. As when we discussed means, let us assume for the moment that
we simply want to estimate some underlying single parameter $\mu$ based on $n$
observations $X_j$, $j = 1, 2, \ldots, n$. In other words, we assume that

    $X_j = \mu + \epsilon_j,$

where the $\epsilon_j$'s are taken to be independent, identically-distributed random
variables with mean zero and variance $\sigma^2$. The least absolute deviation
estimator $\hat\mu$ for $\mu$ is the value of $\mu$ that minimizes

    $\sum_j |X_j - \mu|.$

Taking the derivative and setting it equal to zero, one gets that $\hat\mu$ must
satisfy

    $\sum_j \mathrm{sgn}(X_j - \hat\mu) = 0,$

which is clearly solved by setting $\hat\mu$ equal to the median of the $X_j$'s (so
that half of the sgn's are +1 and the other half are -1).
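
A quick numerical check of this fact, and of the robustness of the median, can be sketched in plain Python. The data below are hypothetical (four B-range grades and one failing grade), and the grid search is only there to confirm the minimizer by brute force.

```python
from statistics import median


def sum_abs_dev(xs, mu):
    """Sum of absolute deviations of the observations xs from the candidate mu."""
    return sum(abs(x - mu) for x in xs)


# Hypothetical grades with one outlier.
xs = [3.0, 3.3, 3.0, 2.7, 0.0]
med = median(xs)
mean = sum(xs) / len(xs)

# Scan candidates 0.00, 0.01, ..., 4.00 for the smallest objective value.
best_on_grid = min((k / 100 for k in range(401)),
                   key=lambda mu: sum_abs_dev(xs, mu))
print(med, mean, best_on_grid)
```

The grid minimizer coincides with the median, while the mean is pulled well below the bulk of the grades by the single outlier.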

4.2. Every Student Takes Every Course. Now let's consider the problem of
estimating student aptitude and course inflatedness from grade data. As before, we
start by assuming that every student takes every course. Once again, let $f$ denote
the function to be minimized:

    $f(\mu_1, \ldots, \mu_m, \nu_1, \ldots, \nu_n) = \frac{1}{mn} \sum_{i,j} |X_{ij} - \mu_i - \nu_j|.$

Taking derivatives with respect to each of the variables and setting these
derivatives to zero, we get the following system of equations for the estimators
$\hat\mu_i$'s and $\hat\nu_j$'s:

    $\sum_j \mathrm{sgn}(X_{ij} - \hat\mu_i - \hat\nu_j) = 0, \qquad i = 1, 2, \ldots, m,$

    $\sum_i \mathrm{sgn}(X_{ij} - \hat\mu_i - \hat\nu_j) = 0, \qquad j = 1, 2, \ldots, n.$

Unlike before, there seems to be no simple description of the solution to this
problem. But, we can give an algorithm that should converge quickly to the solution.
Specifically, initialize

    $\hat\nu_j = 0, \qquad j = 1, 2, \ldots, n,$

    $\hat\mu_i = \mathrm{median}\{X_{ij} \mid j = 1, 2, \ldots, n\}, \qquad i = 1, 2, \ldots, m.$

Then, iterate the following until there is no change from one iteration to the next:

    $\hat\nu_j = \mathrm{median}\{X_{ij} - \hat\mu_i \mid i = 1, 2, \ldots, m\}, \qquad j = 1, 2, \ldots, n,$

    $\hat\mu_i = \mathrm{median}\{X_{ij} - \hat\nu_j \mid j = 1, 2, \ldots, n\}, \qquad i = 1, 2, \ldots, m.$

This algorithm is unlikely to converge to a solution that satisfies
$\sum_j \hat\nu_j = 0$ but, given the initialization, it should come close to this
point.
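
The iteration above is easy to state in code. The following is a minimal sketch in plain Python, assuming a complete grade table stored as a list of rows. The function name, the sweep limit, and the sample grades are hypothetical. Alternating medians are not guaranteed to reach the global minimizer of the sum of absolute deviations in general, so the sweep limit guards against cycling and the result is best read as a quick approximation.

```python
from statistics import median


def alternating_medians(X, max_sweeps=100):
    """Alternate column-wise and row-wise medians on a complete grade table."""
    m, n = len(X), len(X[0])
    nu = [0.0] * n
    mu = [median(row) for row in X]
    for _ in range(max_sweeps):
        new_nu = [median(X[i][j] - mu[i] for i in range(m)) for j in range(n)]
        new_mu = [median(X[i][j] - new_nu[j] for j in range(n)) for i in range(m)]
        if new_nu == nu and new_mu == mu:
            break                      # no change from one sweep to the next
        mu, nu = new_mu, new_nu
    return mu, nu


# Hypothetical additive table (aptitudes 3.0, 2.5, 3.5 and inflatedness
# -0.5, 0.0, 0.5) in which the first grade has been raised by 2 points.
X = [[4.5, 3.0, 3.5],
     [2.0, 2.5, 3.0],
     [3.0, 3.5, 4.0]]
mu, nu = alternating_medians(X)
residuals = [[X[i][j] - mu[i] - nu[j] for j in range(3)] for i in range(3)]
print(mu, nu, residuals)
```

In this small example the iteration recovers the underlying additive structure and leaves the entire 2-point discrepancy in the single altered cell. A least-squares fit would instead spread part of it over the first student and the first course.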



4.3. Students Take Selected Courses. Finally, let us return to the general case
in which each student takes only a small subset of the courses offered. The problem
is to minimize the sum of the absolute values of the $\epsilon_{ij}$'s:

(11)    minimize    $\sum_{(i,j) \in Q} |\epsilon_{ij}|$
        subject to  $X_{ij} = \mu_i + \nu_j + \epsilon_{ij}$ for $(i,j) \in Q$,
                    $\sum_j \nu_j = 0.$

It is easy to reformulate this model as a linear programming (LP) problem:

        minimize    $\sum_{(i,j) \in Q} t_{ij}$
        subject to  $-t_{ij} \le X_{ij} - \mu_i - \nu_j \le t_{ij}$ for $(i,j) \in Q$,
                    $\sum_j \nu_j = 0.$

Such linear programming problems can be solved easily. In the next section we give
some examples and we compare the results from least squares formulations with
those from the least absolute deviations model.
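
The LP above can be handed to any LP solver. As one possible sketch, the Python code below uses `scipy.optimize.linprog` with the HiGHS backend. The choice of solver, the function name `fit_lad`, the 0-based record layout, and the dense constraint matrix (adequate only for small examples) are all assumptions made for illustration. The data are the truncated example of Table 4, with grades encoded in thirds of a point.

```python
import numpy as np
from scipy.optimize import linprog


def fit_lad(records, m, n):
    """Least absolute deviations fit.

    records is a list of (i, j, grade) triples with 0-based student index i
    and course index j. The LP variables are ordered as [mu, nu, t].
    """
    N = len(records)
    c = np.concatenate([np.zeros(m + n), np.ones(N)])   # minimize sum of t_k
    A_ub = np.zeros((2 * N, m + n + N))
    b_ub = np.zeros(2 * N)
    for k, (i, j, x) in enumerate(records):
        # x - mu_i - nu_j <= t_k
        A_ub[2 * k, [i, m + j]] = -1.0
        A_ub[2 * k, m + n + k] = -1.0
        b_ub[2 * k] = -x
        # -(x - mu_i - nu_j) <= t_k
        A_ub[2 * k + 1, [i, m + j]] = 1.0
        A_ub[2 * k + 1, m + n + k] = -1.0
        b_ub[2 * k + 1] = x
    A_eq = np.zeros((1, m + n + N))
    A_eq[0, m:m + n] = 1.0                              # sum_j nu_j = 0
    bounds = [(None, None)] * (m + n) + [(0, None)] * N  # mu, nu free; t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0],
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[m:m + n], res.fun


# Truncated example: student i takes course j when |j - i| <= 2.
records = [(i, j, 10 / 3 + (j - i) / 3)
           for i in range(8) for j in range(8) if abs(j - i) <= 2]
mu, nu, total_abs_dev = fit_lad(records, m=8, n=8)
print(np.round(mu, 2), np.round(nu, 2), round(total_abs_dev, 6))
```

Because this data set can be fitted exactly, the optimal objective value is zero and the solution coincides with the least-squares fit. For a registrar-sized data set one would pass sparse constraint matrices instead of dense arrays.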



5. Examples 

Finally, we consider a few specific examples including one based on real data. 

5.1. Truncated Example. The example shown in Table 2 was contrived in order
to make a point. In particular, it had an infinite number of students and courses.
In Table 4, we show a truncated version consisting of eight students taking courses
from a school offering eight courses. Each student takes three to five courses. As
with the untruncated version, it is clear that the students are listed in order of
their aptitude with the best student at the top. However, student GPA's hardly
reflect the obvious trend in aptitude. The $\hat\mu_i$'s computed by our model make
the difference in aptitude much more apparent. Similarly, average grades given in the
courses show a small trend in the correct direction but they hardly account for the
rather obvious overall trend in course inflatedness as one scans from left to right
across the table. The $\hat\nu_j$'s do a much better job of identifying course
inflatedness.

It is interesting to point out that the least squares and the least absolute
deviation models both give the same results for this particular example.
          MAT    CHE    ANT    REL    POL    ECO    HIS    SOC    GPA    $\mu_i$
Sean      B+     A-     A                                         3.67   4.50
Yoko      B      B+     A-     A                                  3.50   4.17
John      B-     B      B+     A-     A                           3.33   3.83
Paul             B-     B      B+     A-     A                    3.33   3.50
George                  B-     B      B+     A-     A             3.33   3.17
Ringo                          B-     B      B+     A-     A      3.33   2.83
Jane                                  B-     B      B+     A-     3.17   2.50
Heather                                      B-     B      B+     3.00   2.17
Avg.      3.00   3.17   3.33   3.33   3.33   3.33   3.50   3.67
$\nu_j$   -1.17  -0.83  -0.50  -0.17  +0.17  +0.50  +0.83  +1.17

Table 4. Truncated Example. This is the same as the example
shown in Table 2 but it has been truncated to represent a school
with eight students and eight courses. Each student took three to
five courses with grades as shown. As with the untruncated version,
there are clear trends in student aptitude and course inflatedness,
which our model correctly uncovers.



5.2. Circulant Example. This example is almost the same as the truncated
example in the previous subsection. Here, however, we have added two courses to
Sean's schedule and to Heather's schedule and we have added one course to Yoko's
schedule and to Jane's schedule. The result is a table of grades that has a circulant
structure. Now, the trends that were clearly apparent in the truncated example are
completely gone. In this example, both student GPA and the $\hat\mu_i$'s reflect the
lack of any differentiation among the students. Similarly, the course averages and
the $\hat\nu_j$'s both show that all courses are curved the same.
          MAT    CHE    ANT    REL    POL    ECO    HIS    SOC    GPA    $\mu_i$
Sean      B+     A-     A                           B-     B      3.33   3.33
Yoko      B      B+     A-     A                           B-     3.33   3.33
John      B-     B      B+     A-     A                           3.33   3.33
Paul             B-     B      B+     A-     A                    3.33   3.33
George                  B-     B      B+     A-     A             3.33   3.33
Ringo                          B-     B      B+     A-     A      3.33   3.33
Jane      A                           B-     B      B+     A-     3.33   3.33
Heather   A-     A                           B-     B      B+     3.33   3.33
Avg.      3.33   3.33   3.33   3.33   3.33   3.33   3.33   3.33
$\nu_j$   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00

Table 5. Circulant Example. This example is the same as the
previous one except that there are six more grades filling out the
matrix into a circulant form. Now the trends are gone. Every
student has a B+ average and every course is curved to a B+. Our
model correctly assigns every course an easiness adjustment of 0.00
leaving every student's "corrected" GPA equal to his/her original
GPA.
Student  Course   Grade    Student  Course   Grade    Student  Course   Grade
1        F001090  3.7      5        F002046  3        8        S008811  2.7
1        F004148  1.7      5        F005976  3.7      8        ?010952  3.7
1        F006665  2        5        F007285  2.3      8        ?010973  1.7
1        F010449  3        5        F008991  4        9        F002614  2.7
1        S009167  3.7      5        F010762  4        9        ?006664  2
1        S009571  2        5        S001380  3.7      9        F008144  1.7
1        S010994  2.7      5        S004153  2.3      9        F008832  2.7
2        F003387  3        5        S005842  4        9        ?010542  3
2        F009193  2.7      5        S008310  4        9        ?001065  3.3
2        F010693  3        6        F001400  2.7      9        S001542  2
2        F010813  2.7      6        F004647  3.7      9        S004398  2.3
2        S001093  1        6        F006787  3.3      9        S004399  2.3
2        S003408  3        6        F009999  2.7      10       F008991  2
2        S005302  2        6        S003424  3        10       ?009582  2.3
3        F003769  3        6        S003952  3        10       ?        ?
3        F004893  3.7      6        S009187  3.7      10       S002463  4
3        F004896  3.3      6        S010953  3.7      10       S004186  4
3        S004172  1.7      7        F005979  3.3      10       S004398  2
4        F000613  3.7      7        F007230  3.3      11       F001090  3.3
4        F001381  2.7      7        F010437  3.3      11       F001109  3.7
4        F004140  3        7        S006804  4        11       F003243  1.7
4        F005588  3.7      7        S010960  4        11       F005558  2.3
4        S000185  3        8        F001064  4        11       S002625  2.3

Table 6. One semester of data consisting of about 37000 grades
given to about 5000 students. Each record consists of three data
elements: the student id (encoded), the course id (also encoded),
and the grade (converted from a letter grade to a numerical grade
in the usual manner).



5.3. Two Semesters of Real Data. The registrar at a private university in the
northeast has given us a complete two-semester data set. There are about 5000
students at this university, each of whom takes four or five courses per semester
from a selection of roughly 700 courses offered each semester. The data is encoded:
we don't know the identity of any particular student. Nor can we tell which course is
which. All of this has been pre-encoded by the registrar. But, the grades are real.
A small snippet of the data is shown in Table 6.

Table 7 shows a sample of the output from the least squares model. Table 8
shows a sample of the output from the least absolute deviations model. Comparing
Tables 7 and 8, it is clear that the results are similar.

Typical courses have between 10 and 100 students. For the larger courses, there
seems to be an adequate amount of data to draw conclusions. Since the data set
only represents two semesters and most students take only four or five courses in
a semester, one should not put too much credence in the aptitudes assigned to the
students. But, a larger data set consisting of three or four years of data would
contain about 20 to 30 courses of grade data for each student. In such a case, one
could imagine that the $\hat\mu_i$'s would be a pretty good indicator of student
aptitude.

6. Conclusions 

The fundamental data available to a registrar is grading data: the $X_{ij}$'s. In
recent years, this data set has been used for two main purposes: (1) to assess
student achievement, and (2) to assess course-by-course grade inflation. Student
achievement is usually assessed by reporting on a transcript the student's GPA. A
statistical justification for this is that the totality of all student GPA's is the
simple least-squares solution to the following regression model:

    $X_{ij} = \mu_i + \epsilon_{ij}, \qquad (i,j) \in Q.$

At the same time, grade inflation is assessed by reporting average (or median)
grades given in a course. The totality of average course grades is the least-squares
solution to a "dual" regression model:

    $X_{ij} = \nu_j + \epsilon_{ij}, \qquad (i,j) \in Q.$

It seems only natural that these two problems should be combined into one and
that is exactly what we have proposed in this paper.

Grade inflation, and what to do about it, has been discussed extensively in recent
years. In this paper, we have described an analytical approach to disentangling the
course-by-course differences in grading policies from underlying student aptitudes.
If such a tool were to be widely adopted and student aptitude as defined by the
models given in this paper were to become the accepted measure of student
accomplishment, then the issue of standardizing grading policies across a university
would become somewhat moot.

Of course, there is still the important question of comparing grades from
students across different universities, which is something professional schools,
graduate schools, and employers must do routinely. Unfortunately, the model described
here cannot address this difficult problem without a dataset in which students at
divergent universities take common courses. Perhaps the only way to do that would
be to make a huge model in which all high-school and university grading data are
fed into one huge master program. If such data were ever made available, which is
highly doubtful, such a problem would probably be too large to solve on today's
computers.

The models presented in this paper are good examples of least-squares and
least-absolute-deviations regression and can therefore be used as a pedagogical tool
when teaching these topics in statistics and/or optimization courses.

7. Further Reading 

There is, of course, prior literature on the general problem of assessment. Rasch's
book [1] and the related paper [3] introduce, perhaps for the first time, the idea of
representing a score as a function of the difference between ability and difficulty.
Caulkins et al. [1] apply the idea specifically to the problem of adjusting
grade-point averages. Johnson [2] introduced an alternative approach and compared it
to the linear-adjustment models. More recently, the book [3] gives an extensive
treatment of a number of models for adjusting for variations in course difficulty.
Table 7. A partial listing of the course inflatedness associated
with the data partially shown in Table 6. The table shows in three
columns the beginning and the end of a long table of data with
three columns. The first column is the course id, the second
column is the inflatedness $\nu_j$, and the third column shows the course
enrollment. In the interest of space, we show only some of the least
inflated courses and some of the most inflated courses. It is
interesting to note that, with the exception of a few very small classes
(seminar and project courses), the inflatedness spans from about
-0.45 to 0.55. In other words, a student can expect a plus/minus
half-letter grade deviation from his/her "true" aptitude simply
because of differences in grading policies among some courses.
Table 8. A partial listing of the course inflatedness associated
with the data partially shown in Table 6 as computed using the
least absolute deviations model.